Session 1-E

Multi-Armed Bandits

Conference
2:00 PM — 3:30 PM EDT
Local
Jul 7 Tue, 2:00 PM — 3:30 PM EDT

Exploring Best Arm with Top Reward-Cost Ratio in Stochastic Bandits

Zhida Qin and Xiaoying Gan (Shanghai Jiao Tong University, China); Jia Liu (Iowa State University, USA); Hongqiu Wu, Haiming Jin and Luoyi Fu (Shanghai Jiao Tong University, China)

2
The best arm identification problem in multi-armed bandit model has been widely applied into many practical applications. Although lots of works have been devoted into this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any δ ∈ (0,1), with probability at least 1 − δ, the player aims to find the optimal arm with the largest ratio of expected reward to expected cost using as few samplings as possible. To solve this problem, we propose three algorithms and show that their sample complexities grow logarithmically as 1/δ increases. Moreover, compared to existing works, the running of our algorithms is independent of the arm-related parameters, which is more practical. In addition, we provide a fundamental lower bound for sample complexity of any algorithm under Bernoulli distributions and show that the sample complexities of the proposed three algorithms match that of the lower bound in the sense of log1/δ. Finally, we validate our theoretical results through numerical experiments.

MABSTA: Collaborative Computing over Heterogeneous Devices in Dynamic Environments

Yi-Hsuan Kao (Supplyframe, USA); Kwame-Lante Wright (Carnegie Mellon University, USA); Po-Han Huang and Bhaskar Krishnamachari (University of Southern California, USA); Fan Bai (General Motors, USA)

1
Collaborative computing, leveraging resource on multiple wireless-connected devices, enables complex applications that a single device cannot support individually. However, the problem of assigning tasks over devices becomes challenging in the dynamic environments encountered in real-world settings, considering that the resource availability and channel conditions change over time in unpredictable ways due to mobility and other factors. In this paper, we formulate the task assignment problem as an online learning problem using an adversarial multi-armed bandit framework. We propose MABSTA, a novel online learning algorithm that learns the performance of unknown devices and channel qualities continually through exploratory probing and makes task assignment decisions by exploiting the gained knowledge. The implementation of MABSTA, based on Gibbs Sampling approach, is computational-light and offers competitive and robust performance in different scenarios on the trace-data obtained from a wireless IoT testbed. Furthermore, we analyze it mathematically and provide a worst-case performance guarantee for any dynamic environment, without stationarity assumptions. To the best of our knowledge, MABSTA is the first online algorithm in this domain of task assignment problems and provides provable performance guarantee.

Combinatorial Multi-Armed Bandit Based Unknown Worker Recruitment in Heterogeneous Crowdsensing

Guoju Gao (University of Science and Technology of China, China); Jie Wu (Temple University, USA); Mingjun Xiao (University of Science and Technology of China, China); Guoliang Chen (University of Science & Technology of China, China)

2
Mobile crowdsensing, through which a requester can coordinate a crowd of workers to complete some sensing tasks, has attracted significant attention recently. In this paper, we focus on the unknown worker recruitment problem in mobile crowdsensing, where workers' sensing qualities are unknown a priori. We consider the scenario of recruiting workers to complete some continuous sensing tasks. The whole process is divided into multiple rounds. In each round, every task may be covered by more than one recruited workers, but its completion quality only depends on these workers' maximum sensing quality. Each recruited worker will incur a cost, and each task is attached a weight to indicate its importance. Our objective is to determine a recruiting strategy to maximize the total weighted completion quality under a limited budget. We model such an unknown worker recruitment process as a novel combinatorial multi-armed bandit problem, and propose an extended UCB based worker recruitment algorithm. Moreover, we extend the problem to the case where the workers' costs are also unknown and design the corresponding algorithm. We analyze the regrets of the two proposed algorithms and demonstrate their performances through extensive simulations on real-world traces.

Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach

Arun Verma and Manjesh K Hanawal (Indian Institute of Technology Bombay, India)

4
In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator desires to do a resource allocation that maximizes the expected total utility of the network. We consider threshold type utility functions where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold. Otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as `good' as an oracle policy that knows the utilities of agents. We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.

Session Chair

Bo Ji (Temple University)

Session 2-E

Age of Information

Conference
4:00 PM — 5:30 PM EDT
Local
Jul 7 Tue, 4:00 PM — 5:30 PM EDT

AoI Scheduling with Maximum Thresholds

Chengzhang Li, Shaoran Li, Yongce Chen, Thomas Hou and Wenjing Lou (Virginia Tech, USA)

7
Age of Information (AoI) is an application layer performance metric that quantifies freshness of information. This paper investigates scheduling problems at network edge when each source node has a deadline (which we call Maximum AoI Threshold (MAT)). Specifically, we want to determine whether or not a set of MATs of the source nodes is schedulable, and if so, find a feasible scheduler for it. For a small network, we present an optimal procedure called Cyclic Scheduler Detection (CSD) that can determine schedulability with absolute certainty. For a large network where CSD is not applicable, we present a low-complexity procedure, called Fictitious Polynomial Mapping (FPM), and prove that FPM can find a feasible scheduler for any MAT set with the load smaller than ln 2. We use extensive numerical results to validate our theoretical results and show that the performance of FPM is significantly better than EDF.

Minimizing Age of Information in Multi-channel Time-sensitive Information Update Systems

Zhenzhi Qian and Fei Wu (The Ohio State University, USA); Jiayu Pan (Ohio State University, USA); Kannan Srinivasan and Ness B. Shroff (The Ohio State University, USA)

1
Age of information, as a metric measuring the data freshness, has drawn increasing attention due to its importance in many data update applications. Most existing studies have assumed that there is one single channel in the system. In this work, we are motivated by the plethora of multi-channel systems that are being developed, and investigate the following question: how can one exploit multi-channel resources to improve the age performance? We first derive a policy-independent lower bound of the expected long-term average age in a multi-channel system. The lower bound is jointly characterized by the external arrival process and the channel statistics. Since direct analysis of age in multi-channel systems is very difficult, we focus on the asymptotic regime, when the number of users and number of channels both go to infinity. In the many-channel asymptotic regime, we propose a class of Maximum Weighted Matching policies that converge to the lower bound near exponentially fast. In the many-user asymptotic regime, we design a class of Randomized Maximum Weighted Matching policies that achieve a constant competitive ratio compared to the lower bound. Finally, we use simulations to validate the aforementioned results.

On the Minimum Achievable Age of Information for General Service-Time Distributions

Jaya Prakash Varma Champati, Ramana Reddy Avula, Tobias J. Oechtering and James Gross (KTH Royal Institute of Technology, Sweden)

2
There is a growing interest in analysing the freshness of data in networked systems. Age of Information (AoI) has emerged as a popular metric to quantify this freshness at a given destination. There has been a significant research effort in optimizing this metric in communication and networking systems under different settings. In contrast to previous works, we are interested in a fundamental question, what is the minimum achievable AoI in any single-server-single-source queuing system for a given service-time distribution? To address this question, we study a problem of optimizing AoI under service preemptions. Our main result is on the characterization of the minimum achievable average peak AoI (PAoI). We obtain this result by showing that a fixed-threshold policy is optimal in the set of all randomized-threshold causal policies. We use the characterization to provide necessary and sufficient conditions for the service-time distributions under which preemptions are beneficial.

Unifying AoI Minimization and Remote Estimation --- Optimal Sensor/Controller Coordination with Random Two-way Delay

Cho-Hsin Tsai and Chih-Chun Wang (Purdue University, USA)

1
The ubiquitous usage of communication networks in modern sensing and control applications has kindled new interests on the coordination between sensors and controllers over channels with random delay, i.e., how to use the “waiting time” to improve the system performance. Contrary to the common belief that a zero-wait policy is always optimal, Sun et al. show that a controller can strictly improve the data freshness, the so-called Age-of-Information (AoI), by postponing transmission in order to lengthen the duration of staying in a good state. The optimal wait policy for the sensor side was later characterized in the context of remote estimation. Instead of focusing on the sensor and controller sides separately, this work develops the optimal joint sensor/controller wait policy in a Wiener-process system. The results can be viewed as strict generalization of the above two important results in the sense that not only do we consider joint sensor/controller designs (as opposed to sensor-only or controller-only schemes), but we also assume random delay in both the forward and feedback directions (as opposed to random delay in only one direction). In addition to provable optimality, extensive simulation is used to verify the superior performance of the proposed scheme in various settings.

Session Chair

Ana C Aguiar (University of Porto)

Made with in Toronto · Privacy Policy · © 2021 Duetone Corp.